Introduction

Code

Stata

The webuse command can load the directly.

. webuse margex
(Artificial data for margins)

. summarize, sep(0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      3,000    69.73357    21.53986          0      146.3
     outcome |      3,000    .1696667    .3754023          0          1
         sex |      3,000    .5006667    .5000829          0          1
       group |      3,000       1.828    .7732714          1          3
         age |      3,000      39.799    11.54174         20         60
    distance |      3,000    58.58566    181.3987        .25   748.8927
         ycn |      3,000    74.47263    22.01264          0      153.8
          yc |      3,000    .2413333    .4279633          0          1
   treatment |      3,000    .5006667    .5000829          0          1
    agegroup |      3,000    2.263667    .8316913          1          3
         arm |      3,000    1.970667    .7899457          1          3

R

We’ll load the data from Stata’s website via the haven package. We’ll also convert to standard R data.frame and factors.

library(haven)
m <- haven::read_dta("http://www.stata-press.com/data/r15/margex.dta")
m <- data.frame(m)
m$sex <- as_factor(m$sex)
m$group <- as_factor(m$group)
head(m)
      y outcome    sex group age distance   ycn yc treatment agegroup arm
1 102.6       1 female     1  55    11.75  97.3  1         1        3   1
2  77.2       0 female     1  60    30.75  78.6  0         1        3   1
3  80.5       0 female     1  27    14.25  54.5  0         1        1   1
4  82.5       1 female     1  60    29.25  98.4  1         1        3   1
5  71.6       1 female     1  52    19.00 105.3  1         1        3   1
6  83.2       1 female     1  49     2.50  89.7  0         1        3   1

Categorical Variables

Code

Stata

. regress y i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =     15.48
       Model |  14229.0349         2  7114.51743   Prob > F        =    0.0000
    Residual |  1377203.97     2,997  459.527518   R-squared       =    0.0102
-------------+----------------------------------   Adj R-squared   =    0.0096
       Total |  1391433.01     2,999  463.965657   Root MSE        =    21.437

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   .4084991   .8912269     0.46   0.647    -1.338979    2.155977
          3  |   5.373138   1.027651     5.23   0.000     3.358165     7.38811
             |
       _cons |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
------------------------------------------------------------------------------

. margins group

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          1  |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
          2  |   68.76655   .6411134   107.26   0.000     67.50948    70.02361
          3  |   73.73119   .8202484    89.89   0.000     72.12288    75.33949
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ group, data = m))

Call:
lm(formula = y ~ group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-73.531 -14.758   0.101  14.536  72.569 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  68.3580     0.6191 110.419  < 2e-16 ***
group2        0.4085     0.8912   0.458    0.647    
group3        5.3731     1.0277   5.229 1.83e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared:  0.01023,   Adjusted R-squared:  0.009566 
F-statistic: 15.48 on 2 and 2997 DF,  p-value: 2.045e-07
emmeans::emmeans(mod1, ~ group)
 group emmean    SE   df lower.CL upper.CL
 1       68.4 0.619 2997     67.1     69.6
 2       68.8 0.641 2997     67.5     70.0
 3       73.7 0.820 2997     72.1     75.3

Confidence level used: 0.95 

Math

Assume a linear regression set up with a single categorical variable, \(G\), with three groups. We fit

\[ E(Y|G) = \beta_0 + \beta_1g_2 + \beta_2g_3, \]

where \(g_2\) and \(g_3\) are dummy variables representing membership in groups 2 and 3 respectively (\(g_{2i} = 1\) if observation \(i\) has \(G = 2\), and equal to 0 otherwise.) Since \(g_1\) is not found in the model, it is the reference category.

Therefore, \(\beta_0\) represents the average response among the reference category \(G = 1\). \(\beta_1\) represents the difference in the average response between groups \(G = 1\) and \(G = 2\). Therefore, \(\beta_0 + \beta_1\) is the average response in group \(G = 2\). A similar argument can be made about \(\beta_2\) and group 3.

Interactions with Categorical Variables

Code

Stata

. regress y i.sex##i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(5, 2994)      =     92.91
       Model |  186898.199         5  37379.6398   Prob > F        =    0.0000
    Residual |  1204534.81     2,994  402.316235   R-squared       =    0.1343
-------------+----------------------------------   Adj R-squared   =    0.1329
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.058

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   21.62507   1.509999    14.32   0.000     18.66433    24.58581
             |
       group |
          2  |   11.37444   1.573314     7.23   0.000     8.289552    14.45932
          3  |    21.6188   1.588487    13.61   0.000     18.50416    24.73344
             |
   sex#group |
   female#2  |  -4.851582   1.942744    -2.50   0.013     -8.66083   -1.042333
   female#3  |  -6.084875   3.004638    -2.03   0.043    -11.97624   -.1935115
             |
       _cons |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
------------------------------------------------------------------------------

. margins sex

Predictive margins                              Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
       male  |   59.77145   .6454388    92.61   0.000      58.5059      61.037
     female  |   78.20318   .7105463   110.06   0.000     76.80997    79.59639
------------------------------------------------------------------------------

. margins sex#group

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   sex#group |
     male#1  |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
     male#2  |   61.98514   .7772248    79.75   0.000     60.46119    63.50908
     male#3  |    72.2295   .8074975    89.45   0.000     70.64619     73.8128
   female#1  |   72.23577     .63942   112.97   0.000     70.98203    73.48952
   female#2  |   78.75863   .9434406    83.48   0.000     76.90877    80.60849
   female#3  |    87.7697   2.468947    35.55   0.000     82.92869     92.6107
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*group, data = m))

Call:
lm(formula = y ~ sex * group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-72.029 -13.285   0.464  13.741  67.964 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        50.611      1.368  36.998  < 2e-16 ***
sexfemale          21.625      1.510  14.321  < 2e-16 ***
group2             11.374      1.573   7.230 6.12e-13 ***
group3             21.619      1.588  13.610  < 2e-16 ***
sexfemale:group2   -4.852      1.943  -2.497   0.0126 *  
sexfemale:group3   -6.085      3.005  -2.025   0.0429 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared:  0.1343,    Adjusted R-squared:  0.1329 
F-statistic: 92.91 on 5 and 2994 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ group)
 group emmean    SE   df lower.CL upper.CL
 1       61.4 0.755 2994     59.9     62.9
 2       70.4 0.611 2994     69.2     71.6
 3       80.0 1.299 2994     77.5     82.5

Results are averaged over the levels of: sex 
Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ group + sex)
 group sex    emmean    SE   df lower.CL upper.CL
 1     male     50.6 1.368 2994     47.9     53.3
 2     male     62.0 0.777 2994     60.5     63.5
 3     male     72.2 0.807 2994     70.6     73.8
 1     female   72.2 0.639 2994     71.0     73.5
 2     female   78.8 0.943 2994     76.9     80.6
 3     female   87.8 2.469 2994     82.9     92.6

Confidence level used: 0.95 

Math

Consider the simplest case, with two binary variables \(Z\) and \(K\). We fit the model

\[ E(Y|Z,K) = \beta_0 + \beta_1Z + \beta_2K + \beta_3ZK, \]

where \(ZK = 0\) only if both \(Z = 1\) and \(K = 1\).

This is functionally equivalent to defining a new variable \(L\),

\[ L = \begin{cases} 0, & K = 0 \& Z = 0 \\ 1, & K = 0 \& Z = 1 \\ 2, & K = 1 \& Z = 0 \\ 3, & K = 1 \& Z = 1, \end{cases} \]

and fitting the model

\[ E(Y|L) = \beta_0 + \beta_1l_1 + \beta_2l_2 + \beta_3l_3. \]

Categorical and Continuous Variables

Code

Stata

. regress y i.sex c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =    209.88
       Model |  170938.657         2  85469.3283   Prob > F        =    0.0000
    Residual |  1220494.35     2,997  407.238688   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1223
       Total |  1391433.01     2,999  463.965657   Root MSE        =     20.18

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.03271   .7777377    18.04   0.000     12.50775    15.55766
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
       _cons |   82.78115   1.323613    62.54   0.000     80.18586    85.37643
------------------------------------------------------------------------------

. margins sex, at(age = (30 40))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          30

2._at        : age             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   67.65015   .5604889   120.70   0.000     66.55116    68.74913
   1#female  |   81.68285   .6911128   118.19   0.000     80.32775    83.03796
     2#male  |   62.60648    .537682   116.44   0.000     61.55222    63.66074
   2#female  |   76.63919    .533784   143.58   0.000     75.59257    77.68581
------------------------------------------------------------------------------

R

For each unique level of the continuous variable, a seperate emmeans call is needed apparently. If anyone knows a way around this, please let me know.

summary(mod1 <- lm(y ~ sex + age, data = m))

Call:
lm(formula = y ~ sex + age, data = m)

Residuals:
   Min     1Q Median     3Q    Max 
-71.99 -13.88  -0.15  13.76  75.28 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  82.7811     1.3236   62.54   <2e-16 ***
sexfemale    14.0327     0.7777   18.04   <2e-16 ***
age          -0.5044     0.0337  -14.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.1223 
F-statistic: 209.9 on 2 and 2997 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ sex, at = list(age = 30))
 sex    emmean    SE   df lower.CL upper.CL
 male     67.7 0.560 2997     66.6     68.7
 female   81.7 0.691 2997     80.3     83.0

Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ sex, at = list(age = 40))
 sex    emmean    SE   df lower.CL upper.CL
 male     62.6 0.538 2997     61.6     63.7
 female   76.6 0.534 2997     75.6     77.7

Confidence level used: 0.95 

Categorical and Continuous Interactions

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, at(age = (30 40))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          30

2._at        : age             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   67.58054   .5984013   112.94   0.000     66.40722    68.75386
   1#female  |   81.83127   .8228617    99.45   0.000     80.21784     83.4447
     2#male  |   62.65093   .5541361   113.06   0.000     61.56441    63.73746
   2#female  |   76.67755   .5461908   140.39   0.000      75.6066    77.74849
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ sex, at = list(age = 30))
 sex    emmean    SE   df lower.CL upper.CL
 male     67.6 0.598 2996     66.4     68.8
 female   81.8 0.823 2996     80.2     83.4

Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ sex, at = list(age = 40))
 sex    emmean    SE   df lower.CL upper.CL
 male     62.7 0.554 2996     61.6     63.7
 female   76.7 0.546 2996     75.6     77.7

Confidence level used: 0.95 

No Interactions

Code

Stata

. regress y i.sex c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =    209.88
       Model |  170938.657         2  85469.3283   Prob > F        =    0.0000
    Residual |  1220494.35     2,997  407.238688   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1223
       Total |  1391433.01     2,999  463.965657   Root MSE        =     20.18

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.03271   .7777377    18.04   0.000     12.50775    15.55766
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
       _cons |   82.78115   1.323613    62.54   0.000     80.18586    85.37643
------------------------------------------------------------------------------

. margins, dydx(age)

Average marginal effects                        Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex + age, data = m))

Call:
lm(formula = y ~ sex + age, data = m)

Residuals:
   Min     1Q Median     3Q    Max 
-71.99 -13.88  -0.15  13.76  75.28 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  82.7811     1.3236   62.54   <2e-16 ***
sexfemale    14.0327     0.7777   18.04   <2e-16 ***
age          -0.5044     0.0337  -14.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.1223 
F-statistic: 209.9 on 2 and 2997 DF,  p-value: < 2.2e-16
emmeans::emtrends(mod1, ~ 1, var = "age")
 1       age.trend     SE   df lower.CL upper.CL
 overall    -0.504 0.0337 2997    -0.57   -0.438

Results are averaged over the levels of: sex 
Confidence level used: 0.95 

Interactions

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, dydx(age)

Average marginal effects                        Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age          |
         sex |
       male  |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
     female  |  -.5153724   .0472435   -10.91   0.000    -.6080054   -.4227395
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
emmeans::emtrends(mod1, ~ sex, var = "age")
 sex    age.trend     SE   df lower.CL upper.CL
 male      -0.493 0.0481 2996   -0.587   -0.399
 female    -0.515 0.0472 2996   -0.608   -0.423

Confidence level used: 0.95 

Categorical Variables

Code

Stata

. regress y i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =     15.48
       Model |  14229.0349         2  7114.51743   Prob > F        =    0.0000
    Residual |  1377203.97     2,997  459.527518   R-squared       =    0.0102
-------------+----------------------------------   Adj R-squared   =    0.0096
       Total |  1391433.01     2,999  463.965657   Root MSE        =    21.437

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   .4084991   .8912269     0.46   0.647    -1.338979    2.155977
          3  |   5.373138   1.027651     5.23   0.000     3.358165     7.38811
             |
       _cons |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
------------------------------------------------------------------------------

. margins group, pwcompare(pv)

Pairwise comparisons of adjusted predictions    Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

-----------------------------------------------------
             |            Delta-method    Unadjusted
             |   Contrast   Std. Err.      t    P>|t|
-------------+---------------------------------------
       group |
     2 vs 1  |   .4084991   .8912269     0.46   0.647
     3 vs 1  |   5.373138   1.027651     5.23   0.000
     3 vs 2  |   4.964638   1.041073     4.77   0.000
-----------------------------------------------------

R

Note that pairs is a generic, meaning that emmeans::pairs is an invalid call. Despite this, the “emmeans” package is required to call the below.

summary(mod1 <- lm(y ~ group, data = m))

Call:
lm(formula = y ~ group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-73.531 -14.758   0.101  14.536  72.569 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  68.3580     0.6191 110.419  < 2e-16 ***
group2        0.4085     0.8912   0.458    0.647    
group3        5.3731     1.0277   5.229 1.83e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared:  0.01023,   Adjusted R-squared:  0.009566 
F-statistic: 15.48 on 2 and 2997 DF,  p-value: 2.045e-07
pairs(emmeans::emmeans(mod1, ~ group), adjust = "none")
 contrast estimate    SE   df t.ratio p.value
 1 - 2      -0.408 0.891 2997 -0.458  0.6467 
 1 - 3      -5.373 1.028 2997 -5.229  <.0001 
 2 - 3      -4.965 1.041 2997 -4.769  <.0001 

The adjust = "none" argument skips any multiple comparisons correction. This is done to obtain the same results as the regression coefficients. If you’d prefer a more ANOVA post-hoc approach, there are other options for adjust, see ?emmeans::contrast for details.

Categorical Interactions

Column

Stata

. regress y i.group##i.sex

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(5, 2994)      =     92.91
       Model |  186898.199         5  37379.6398   Prob > F        =    0.0000
    Residual |  1204534.81     2,994  402.316235   R-squared       =    0.1343
-------------+----------------------------------   Adj R-squared   =    0.1329
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.058

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   11.37444   1.573314     7.23   0.000     8.289552    14.45932
          3  |    21.6188   1.588487    13.61   0.000     18.50416    24.73344
             |
         sex |
     female  |   21.62507   1.509999    14.32   0.000     18.66433    24.58581
             |
   group#sex |
   2#female  |  -4.851582   1.942744    -2.50   0.013     -8.66083   -1.042333
   3#female  |  -6.084875   3.004638    -2.03   0.043    -11.97624   -.1935115
             |
       _cons |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
------------------------------------------------------------------------------

. margins group#sex, pwcompare(pv)

Pairwise comparisons of adjusted predictions    Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------
                          |            Delta-method    Unadjusted
                          |   Contrast   Std. Err.      t    P>|t|
--------------------------+---------------------------------------
                group#sex |
  (1#female) vs (1#male)  |   21.62507   1.509999    14.32   0.000
    (2#male) vs (1#male)  |   11.37444   1.573314     7.23   0.000
  (2#female) vs (1#male)  |   28.14793   1.661722    16.94   0.000
    (3#male) vs (1#male)  |    21.6188   1.588487    13.61   0.000
  (3#female) vs (1#male)  |     37.159   2.822577    13.16   0.000
  (2#male) vs (1#female)  |  -10.25064   1.006447   -10.18   0.000
(2#female) vs (1#female)  |   6.522856    1.13971     5.72   0.000
  (3#male) vs (1#female)  |  -.0062748   1.030005    -0.01   0.995
(3#female) vs (1#female)  |   15.53392   2.550404     6.09   0.000
  (2#female) vs (2#male)  |   16.77349   1.222358    13.72   0.000
    (3#male) vs (2#male)  |   10.24436   1.120772     9.14   0.000
  (3#female) vs (2#male)  |   25.78456   2.588393     9.96   0.000
  (3#male) vs (2#female)  |  -6.529131   1.241826    -5.26   0.000
(3#female) vs (2#female)  |   9.011069   2.643063     3.41   0.001
  (3#female) vs (3#male)  |    15.5402   2.597644     5.98   0.000
------------------------------------------------------------------

. margins sex@group, contrast(pv nowald)

Contrasts of adjusted predictions               Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------
                    |            Delta-method
                    |   Contrast   Std. Err.      t    P>|t|
--------------------+---------------------------------------
          sex@group |
(female vs base) 1  |   21.62507   1.509999    14.32   0.000
(female vs base) 2  |   16.77349   1.222358    13.72   0.000
(female vs base) 3  |    15.5402   2.597644     5.98   0.000
------------------------------------------------------------

R

To do: All pairwise comparisons

summary(mod1 <- lm(y ~ group*sex, data = m))

Call:
lm(formula = y ~ group * sex, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-72.029 -13.285   0.464  13.741  67.964 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        50.611      1.368  36.998  < 2e-16 ***
group2             11.374      1.573   7.230 6.12e-13 ***
group3             21.619      1.588  13.610  < 2e-16 ***
sexfemale          21.625      1.510  14.321  < 2e-16 ***
group2:sexfemale   -4.852      1.943  -2.497   0.0126 *  
group3:sexfemale   -6.085      3.005  -2.025   0.0429 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared:  0.1343,    Adjusted R-squared:  0.1329 
F-statistic: 92.91 on 5 and 2994 DF,  p-value: < 2.2e-16
emmeans::contrast(emmeans::emmeans(mod1, ~ sex | group), "pairwise")
group = 1:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -21.6 1.51 2994 -14.321 <.0001 

group = 2:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -16.8 1.22 2994 -13.722 <.0001 

group = 3:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -15.5 2.60 2994  -5.982 <.0001 

Plotting

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, at(age = (20(10)60))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          20

2._at        : age             =          30

3._at        : age             =          40

4._at        : age             =          50

5._at        : age             =          60

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   72.51015   .9336568    77.66   0.000     70.67947    74.34082
   1#female  |     86.985    1.22567    70.97   0.000     84.58175    89.38824
     2#male  |   67.58054   .5984013   112.94   0.000     66.40722    68.75386
   2#female  |   81.83127   .8228617    99.45   0.000     80.21784     83.4447
     3#male  |   62.65093   .5541361   113.06   0.000     61.56441    63.73746
   3#female  |   76.67755   .5461908   140.39   0.000      75.6066    77.74849
     4#male  |   57.72132   .8477402    68.09   0.000     56.05911    59.38353
   4#female  |   71.52382    .604927   118.24   0.000     70.33771    72.70994
     5#male  |   52.79171   1.262091    41.83   0.000     50.31706    55.26637
   5#female  |    66.3701   .9380502    70.75   0.000     64.53081    68.20939
------------------------------------------------------------------------------

. marginsplot, recastci(rarea)

  Variables that uniquely identify margins: age sex

R

library(interactions)
summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
interactions::interact_plot(mod1, pred = age, modx = sex, interval = TRUE)